SensEURCity: A multi-city air quality dataset collected for 2020/2021 using open low-cost sensor systems

Low-cost air quality sensor systems can be deployed at high density, making them significant candidates as complementary tools for improved air quality assessment. However, they still suffer from poor or unknown data quality. In this paper, we report on a unique dataset including the raw sensor data of quality-controlled sensor networks along with co-located reference datasets. Sensor data were collected using the AirSensEUR sensor system, which includes sensors to monitor NO, NO2, O3, CO, PM2.5, PM10, PM1, CO2 and meteorological parameters. In total, 85 sensor systems were deployed over a year in three European cities (Antwerp, Oslo and Zagreb), resulting in a dataset covering a variety of meteorological and ambient conditions. The main data collection included two co-location campaigns in different seasons at an Air Quality Monitoring Station (AQMS) in each city and a deployment at various locations in each city (including locations at other AQMSs). The dataset consists of data files with sensor and reference data, and metadata files describing locations, deployment dates, sensors and reference instruments.


Background & Summary
Air quality remains a major concern in many parts of Europe, especially in urban areas 1. The most important air pollutants in terms of health are particulate matter (PM), nitrogen dioxide (NO2) and ground-level ozone (O3).
For the efficient implementation of air policies, air quality monitoring data of sufficient quality, with high spatial density and temporal resolution, are needed. These data can supplement data from Air Quality Monitoring Stations (AQMSs), which are used to assess ambient air quality in Europe as defined in Directive 2008/50/EC. Low-cost air quality sensor systems consist of an integrated set of hardware that uses one or more sensors to measure the quantity of a chemical species and can supply real-time measurements 2.
Because they cost less than reference air quality monitoring methods 3, sensor systems can be deployed at high density, making them significant candidates as complementary tools for improved air quality management. However, they still suffer from poor or unknown data quality 3. Sensor signals can be affected by interfering compounds, temperature, humidity, pressure and signal drift over time 4-6.
The European Commission Joint Research Centre (JRC) has recently conducted a research project to evaluate a low-cost sensor system, the AirSensEUR, as a supplemental tool for reference air quality monitoring. The AirSensEUR sensor system (Fig. 1) contains sensors to monitor NO, NO2, O3, CO, PM2.5, PM10, PM1, CO2 and meteorological parameters (temperature, relative humidity and atmospheric pressure). Within this project, the air quality data from the 85 AirSensEURs and partially from the Air Quality Monitoring Stations […] github.com/ec-jrc/airsenseur-sensorshost, https://github.com/ec-jrc/airsenseur-box, https://github.com/ec-jrc/airsenseur-box).
The data collected by the units were stored locally and periodically sent to an InfluxDB database 7 on an external server for offline post-processing and/or calibration. Data were generally transferred via GPRS/LTE, and via WiFi for a few units.
AirSensEUR includes a PTFE enclosure measuring 26 cm × 22 cm × 10 cm and weighing 2 kg, battery included (see Fig. 1 top). The PTFE enclosure was inserted in a protective stainless-steel cover. The overall size of the stainless-steel cover is 35 cm × 32 cm × 30 cm, except for the top cover, which is made from a 42 cm × 45 cm aluminium plate (see Fig. 1 bottom). Table 1 gives an overview of the measured pollutants, the sensor types and their manufacturers. The OPC-N3 has 24 size bins (0.3/0.35-40 µm) with a counting efficiency of 50% at 0.3 µm and 100% at 0.35 µm, and the PMS5003 has 6 size bins (>0.3 µm) with a counting efficiency of 50% at 0.3 µm and 100% at 0.5 µm. These counting efficiencies are those claimed by the manufacturers, although experiments have shown that the counting efficiency of the PMS5003 sensor is about 80% at 0.5 µm 10. To our knowledge, no publications on the counting efficiency of the OPC-N3 are available.
The gas sensors for NO2, CO, NO and O3 are installed on the AirSensEUR Chemical sensor Shield (version R31), and the PM and CO2 sensors are installed on the Exp1Shield R10 sensor shield 7. In addition, the sensor box is equipped with sensors that monitor temperature and relative humidity inside the AirSensEUR box near the chemical sensors, and with other sensors on a Flyboard that monitor ambient air temperature, relative humidity and atmospheric pressure outside the box. The information presented in Table 1 is also given in the metadata file metadata_sensors.csv 11.
Sensor box sampling periods. The data were collected between April 2020 and April 2021. The exact sampling intervals differed slightly between cities. An overview of the sampling timeline is given in Fig. 2, which shows the dates of the feasibility study in Ispra, the pilot studies in the cities, the first co-location, the deployment at different sites and the second co-location in each city. Details of the sampling are given in the paragraphs below. A detailed overview of the start and stop dates at each location is given in the metadata file metadata_dates.csv 11.

Fig. 2 Overview of the sampling with sampling site locations and timeline (map panels include the co-location sites in Oslo and Zagreb; *f.s.: feasibility study including small-scale co-location in Ispra with ten sensor boxes; **pilot in the cities with ten sensor boxes). Map data ©2022 Google, Imagery ©2022 NASA, TerraMetrics.
Prior to the sampling campaigns in the cities, selected sensor systems were deployed at the EMEP-ABSIS-ICOS station of the JRC in Ispra (IT) as an initial feasibility study. Subsequently, pilot studies were performed in each city with the same ten boxes used in the feasibility study, to test data transfer, installation, etc. In each city, the sampling campaigns included three consecutive phases:
• co-location of all sensor systems at an AQMS in the city (hereafter called 'first co-location');
• deployment of the sensor systems at different locations in the same city (hereafter called 'deployment');
• co-location of all sensor systems at the same AQMS as the first co-location (hereafter called 'second co-location').
Feasibility study in Ispra and pilot studies in the three cities (Antwerp, Oslo and Zagreb) prior to the main co-locations and deployment campaigns. Ten sensor systems were installed at the EMEP-ABSIS-ICOS station of the JRC in Ispra (IT), a semi-rural site in Northern Italy 12, between 17 and 31 January 2020 (Fig. 2). The same ten systems were used in the pilot studies in the three cities: four sensor systems were deployed in Antwerp (40641B, 4065D0, 4065E0 and 4065E3), four in Oslo (40458D, 40642E, 4065ED and 40816F) and two in Zagreb (4047D0 and 406414). The purpose of this study was to check the reliability of the AirSensEUR sensor systems and to collect data for calibration at a semi-rural site. The characteristics of the reference air pollution analysers and the meteorological parameters at the EMEP-ABSIS-ICOS station are given in Table 2. The gas analysers were routinely calibrated, and daily calibration checks were performed to detect and correct possible drifts of the monitoring equipment.
The sensor systems used in the feasibility study in Ispra were also included in initial pilot studies in Antwerp, Oslo and Zagreb before the first co-location, to check correct deployment and operation at a few field sites (see Fig. 2 and file metadata_dates.csv 11).
Common naming convention for sampling site description. A common naming convention for the sampling sites is used for the three consecutive phases in the three cities. The sampling site labels (IDs) have the form XXX_YYY_ZZZ(Z), where:
• XXX refers to the city: ANT (Antwerp), OSL (Oslo), ZAG (Zagreb);
• YYY describes the type of location: URB (urban background or suburban background), TRA (traffic site in an urban or suburban area), RUR (rural site), REF (AQMS with reference measurements, without any further characterisation);
• ZZZ(Z), three or four characters, refers to the street name of the location or the name of the AQMS.
www.nature.com/scientificdata
Main co-locations and deployment campaigns in Antwerp. For the first co-location, the sensor systems were installed at AQMS 42R801 in Borgerhout on 2020-04-02 and 2020-04-03, and sampling lasted roughly until 2020-06-05 (about 72 days). Between 2020-06-15 and 2020-06-18, the sensor systems were moved to their deployment sites, apart from two units that were installed on 2020-06-22. The sensor systems stayed at the deployment sites for approximately 8 months. Between 2021-02-17 and 2021-02-26, the sensor systems were taken down from their deployment locations and installed at the same AQMS for the second co-location, which lasted until 2021-04-13 (about 45 days). A detailed overview of the start and stop dates at each location (deployment sites) is given in the metadata file metadata_dates.csv 11 and is displayed in Fig. 4.
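The site-label convention above can be checked programmatically, for example when joining data files to metadata_sites.csv. The following Python sketch is illustrative only. The function name parse_site_id and the lenient 3-4 letter type field are our assumptions and are not part of the published dataset tooling. The text also uses labels such as OSL_TRAF_VINK, whose type field has four letters.

```python
import re
from typing import NamedTuple

# City and location-type codes as listed in the naming convention.
CITIES = {"ANT": "Antwerp", "OSL": "Oslo", "ZAG": "Zagreb"}
SITE_TYPES = {
    "URB": "urban or suburban background",
    "TRA": "traffic site in urban or suburban area",
    "RUR": "rural site",
    "REF": "AQMS with reference measurements",
}


class SiteID(NamedTuple):
    city: str
    site_type: str
    name: str


# Three underscore-separated fields. The type field is matched leniently
# (3-4 capitals, an assumption) because labels such as OSL_TRAF_VINK occur.
_PATTERN = re.compile(r"^([A-Z]{3})_([A-Z]{3,4})_([A-Za-z0-9]{3,4})$")


def parse_site_id(label: str) -> SiteID:
    """Split a hypothetical XXX_YYY_ZZZ(Z) label into city, site type and name."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"not an XXX_YYY_ZZZ(Z) label: {label!r}")
    city_code, type_code, name = match.groups()
    if city_code not in CITIES:
        raise ValueError(f"unknown city code: {city_code}")
    # Four-letter variants such as TRAF are mapped onto their 3-letter prefix.
    site_type = SITE_TYPES.get(type_code[:3], "unknown")
    return SiteID(CITIES[city_code], site_type, name)


if __name__ == "__main__":
    print(parse_site_id("ANT_REF_R801"))
    print(parse_site_id("OSL_TRAF_VINK"))
```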

Main co-locations and deployment campaigns in Oslo.
The sensor systems involved in the first co-location were installed at the Kirkeveien AQMS between 2020-08-26 and 2020-08-28, and sampling lasted roughly until 2020-10-14 (about 48 days). The exceptions were two sensor systems (4065ED and 40458D) that stayed at the pilot sites (OSL_TRAF_VINK and OSL_TRAF_LIND). Subsequently, all units except two were moved to their deployment sites. Installation at the deployment sites started on 2020-10-16. By 2020-12-01 most sensor systems were operational, and they remained operational until 2021-03-08 (roughly 88 days). For the second co-location, the sensor systems were installed on 2021-03-08 and 2021-03-10, and this co-location lasted until 2021-04-09 (roughly 31 days). One sensor system collected data over only a very limited period. A detailed overview of the start and stop dates at each location (deployment sites) is given in the metadata file metadata_dates.csv 11 and is displayed in Fig. 6.
Main co-locations and deployment campaigns in Zagreb. For the first co-location, the sensor systems were installed at the IMI AQMS on 2020-05-18 and 2020-05-27, and the co-location lasted roughly until 2020-07-15 (around 60 days). The deployment period lasted roughly from 2020-07-20 to 2021-02-18 (approximately 7 months). The second co-location lasted roughly from 2021-03-03 to 2021-04-12 (approximately 37 days). A detailed overview of the start and stop dates at each location (deployment sites) is given in the metadata file metadata_dates.csv 11 and is displayed in Fig. 8.

Co-location sites. The details of the AQMSs where the co-location campaigns took place, including the measured pollutants and reference analysers at each AQMS, are given in Table 2. In Oslo (OSL_REF_KVN), Palas Fidas 200 data are reported only for the 1st co-location. For the 2nd co-location, data from the TEOM instrument are reported instead. For the naming convention of the test sites in Table 2, see section "Sensor locations: deployment sites".
Co-location site in Antwerp. The reference monitoring at ANT_REF_R801 includes PM10, PM2.5, NOx (NO2 and NO), CO, CO2 and O3 (see Table 2). The reference station also includes SO2, BC and UFP monitoring. CO and CO2 are not monitored permanently. During the two co-locations, one CO monitor and one CO2 monitor were temporarily installed at the station, sharing the inlet of the other gas analysers. The list of AirSensEUR sensor systems co-located at the ANT_REF_R801 station is given in metadata_dates.csv 11.
Co-location site in Oslo. The Kirkeveien AQMS (OSL_REF_KVN), located at 59.93230°N, 10.72455°E at an altitude of 58.3 m, is a traffic station next to an urban ring road with an average daily traffic intensity of ca. 15,000 vehicles.
The reference monitoring at OSL_REF_KVN normally includes PM10, PM2.5, CO, NO and NO2. An O3 monitor was additionally installed at the station for the two co-location campaigns. In addition to two TEOM PM monitors with PM2.5 and PM10 inlets, a Palas Fidas 200 instrument was operational during the 1st co-location campaign. The list of AirSensEUR sensor systems co-located at the OSL_REF_KVN station is given in metadata_dates.csv 11.
Co-location site in Zagreb. The IMI AQMS (ZAG_REF_IMI), located at 45.835305°N, 15.977822°E, at an altitude of 195 m, is an urban background station within the Zagreb network for air quality monitoring.
The reference monitoring at ZAG_REF_IMI includes PM10, PM2.5, NOx (NO2 and NO), CO, O3, SO2 and benzene. The list of AirSensEUR sensor systems co-located at the ZAG_REF_IMI station is given in metadata_dates.csv 11.
Sensor locations: deployment sites. The deployment sites were selected to ensure good spatial coverage of each city and a suitable distribution among background sites, traffic sites and AQMSs. The deployment sites differ in traffic impact, both in traffic density and in distance to the street. Most sampling sites are at locations other than an AQMS (hereafter referred to as 'dedicated sites'). Some sites were selected very close to each other (neighbouring sensors), with varying traffic density, to assess short-term spatial variability. Some sensor systems were installed at AQMSs to check the agreement between sensors and reference analysers over a longer period than the co-location periods. At some of the AQMSs, duplicate systems were deployed to evaluate between-sensor variance. In total, three, three and one pairs of duplicate sensor systems were installed in Antwerp, Oslo and Zagreb, respectively:
• Antwerp: 40499C and 4043A7, 4049A6 and 4067BD, 4043AE and 4067B3;
• Oslo: 40642E and 64FD11, 64E9C5 and 65063E, 649526 and 42816E;
• Zagreb: 4047D0 and 427907.
The locations of these boxes during the deployment are given in metadata_dates.csv, and the locations are described in metadata_sites.csv 11. The number of sensor systems installed in each city and their distribution over AQMSs and dedicated sites is given in Table 3.
The file metadata_sites.csv 11 contains the metadata of the sampling sites in the three cities, with the distances to the road and an indication of traffic intensity. For Antwerp, the traffic intensity (vehicles per hour) was based on modelled data from the Department MOW (Mobiliteit en Openbare Werken, or Mobility and Public Works). It was calculated from the annually averaged traffic density over all hours of the day, so actual vehicle numbers during peak hours may be much larger. For Oslo, the daily averaged traffic density was first estimated using a traffic model and then converted to an averaged hourly traffic density, for consistency with the Antwerp data. For Zagreb, quantitative traffic density information was not available, and qualitative estimates are provided instead.
Detailed information on each sampling site is given in a pdf file (metadata_sampling_site_description.pdf 11).
Conditions during co-location and deployment. The ambient conditions and concentrations during the co-location campaigns and deployment showed broad ranges.
Meteorological conditions. A broad range of atmospheric conditions was covered during co-location and deployment in the three cities. An overview of the conditions is summarized in Table 4. During the co-location periods, the hourly temperature ranged between −1 and 28 °C in Antwerp and between −3 and 26 °C in Oslo. The daily averages in Zagreb were between 0 and 26 °C. The hourly relative humidity ranged between 25 and 100% in Oslo, between 21 and 99% in Antwerp and between 44 and 89% in Zagreb (daily values).
During the deployment, the hourly temperature ranged between −7 and 39 °C in Antwerp and between −14 and 12 °C in Oslo. The daily averages in Zagreb were between −3 and 27 °C. The hourly relative humidity ranged between 29 and 100% in Oslo, between 20 and 100% in Antwerp and between 33 and 97% in Zagreb (daily values).
The deployment in Oslo did not cover the summer period, resulting in a narrower range of atmospheric conditions.
Table 3. Total number of AirSensEUR sensor systems and their distribution at the AQMSs and dedicated sites during the deployment in the three cities. **In addition, two sensor systems were installed at AQMSs that are not automatic stations but use filters or other sampling techniques. Data from these stations are not used, and these stations are therefore not classified as reference stations.
Notable differences in concentrations were observed between the two co-locations and the deployment period, as well as between the cities.

Data Records
The data are publicly available and can be freely accessed from Zenodo 11. The dataset consists of the data files (directory dataset) and the metadata (directory metadata).

CO_A4_P1, NO_B4_P1, NO2_B43F_P1, OX_A431_P1: raw sensor data for the CO-A4, NO-B4, NO2-B43F and OX-A431 sensors, respectively, in nA, measured at the sensor working electrodes (decimal number).

LocationID: when AirSensEUR boxes are co-located at an Air Quality Monitoring Station (AQMS), the LocationID identifies the sampling site, using the Location ID given in Table 9 and metadata_sites.csv (string).

BMP280: atmospheric pressure in hPa measured by the Bosch Sensortec BMP280 sensor. The sensor is located on the chemical sensor shield R31 (see Fig. 1, top left) (decimal number).

SHT31HE, SHT31TE: ambient relative humidity in % and temperature in °C, respectively, measured by the Sensirion SHT31-DIS-B sensor with filter membrane SF2. The sensor is located on a flying board and senses the ambient air directly, with as little influence as possible from electronic heat and from the temperature of the stainless-steel protective box.

Absolute_humidity: calculated quantity giving the mass of water vapour per volume of the air mixture in g/m³, that is, the concentration of water vapour, computed using the Clausius-Clapeyron equation 23.

Td_deficit: calculated quantity giving the difference in °C between the ambient air temperature and the dew point. The dew point was computed using the Magnus equation 24 (decimal number).

SHT31HI, SHT31TI: relative humidity in % and temperature in °C, respectively, measured by the Sensirion SHT31-DIS-B sensor. The sensor is located on the sensor shield R31, near the electrochemical sensors (decimal number).

Sensor and reference data. One data file is supplied for each AirSensEUR sensor system, in comma-separated csv format without quotes. The data files are named "City_ASE_ID.csv", where City is the city where the AirSensEUR sensor system was deployed, ASE stands for AirSensEUR and ID is the unique identifier of the sensor system. The data files are in wide format, with one row of data for each minute in which the AirSensEUR sensor system recorded data from any sensor. Within each row, missing data are reported as an empty field. Each row includes minute raw sensor data, reference data, meteorological data (temperature, relative humidity and atmospheric pressure), date, time and location. The column headers present in the datasets are listed in Tables 5-7, with descriptions and units for sensor and reference data. The datasets also include quality flags for sensor data, as described in section Technical Validation.
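Users who want to recompute the derived Absolute_humidity and Td_deficit columns from SHT31TE and SHT31HE can do so along the lines of the Python sketch below. The constants are assumptions. The Clausius-Clapeyron form uses a latent heat of 2.501e6 J/kg, a water-vapour gas constant of 461.5 J/(kg K) and 6.112 hPa at 0 °C. The Magnus step uses the coefficients 17.62 and 243.12 °C. The formulations of refs 23 and 24 may differ slightly, so recomputed values may deviate a little from the supplied columns.

```python
import math

# Assumed physical constants (not taken from the dataset documentation).
L_V = 2.501e6        # latent heat of vaporisation, J/kg
R_V = 461.5          # specific gas constant of water vapour, J/(kg K)
E0_HPA = 6.112       # saturation vapour pressure at 0 °C, hPa
T0_K = 273.15        # 0 °C in kelvin
MAGNUS_A, MAGNUS_B = 17.62, 243.12   # Magnus coefficients (B in °C)


def saturation_vapour_pressure_hpa(t_c: float) -> float:
    """Clausius-Clapeyron approximation with a constant latent heat."""
    t_k = t_c + T0_K
    return E0_HPA * math.exp((L_V / R_V) * (1.0 / T0_K - 1.0 / t_k))


def absolute_humidity(t_c: float, rh: float) -> float:
    """Water-vapour density in g/m³ from temperature (°C) and relative humidity (%)."""
    e_pa = rh / 100.0 * saturation_vapour_pressure_hpa(t_c) * 100.0  # hPa -> Pa
    return 1000.0 * e_pa / (R_V * (t_c + T0_K))                      # kg/m³ -> g/m³


def dew_point(t_c: float, rh: float) -> float:
    """Magnus dew point in °C; rh must lie in (0, 100] %."""
    if not 0.0 < rh <= 100.0:
        raise ValueError("relative humidity must be in (0, 100] %")
    gamma = math.log(rh / 100.0) + MAGNUS_A * t_c / (MAGNUS_B + t_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def td_deficit(t_c: float, rh: float) -> float:
    """Difference between air temperature and dew point, in °C."""
    return t_c - dew_point(t_c, rh)


if __name__ == "__main__":
    # 20 °C and 50 % RH give roughly 8.7 g/m³ and a deficit of about 10.7 °C.
    print(round(absolute_humidity(20.0, 50.0), 2), round(td_deficit(20.0, 50.0), 2))
```

At 100% relative humidity the Magnus expressions give a dew point equal to the air temperature, so Td_deficit becomes zero. This is a quick sanity check when comparing recomputed values against the supplied column.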
In addition to the mass concentrations, particle numbers per bin from the Palas Fidas 200 are supplied for the co-location site ANT_REF_R801 during the two co-location periods and the deployment period in Antwerp (ANT_REF_R801_FIDAS_UTC.csv). They are also supplied for OSL_REF_KVN in Oslo during the first co-location period (OSL_REF_KVN_Fidas_UTC.csv). These files are comma-separated and contain minute data. Their content is described in Table 8. Missing and invalid data are indicated with an empty cell, while 0 indicates no particle counts.
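The Python sketch below is a minimal loader that follows these conventions. An empty field is treated as missing, while a 0 in the Fidas files is kept as a genuine zero count. It uses only the standard library. The function names and the column names in the example are hypothetical. One pitfall is worth noting: some sensor-system IDs, such as 4065E0 or 4065E3, are valid floating-point literals in scientific notation. ID-like columns should therefore be kept as text, which is what the text_columns parameter is for.

```python
import csv
import io
import re
from pathlib import Path
from typing import Dict, Iterable, Iterator, Optional, TextIO, Tuple, Union

Value = Union[float, str, None]

# Data files are named "City_ASE_ID.csv"; the ID is assumed to be alphanumeric.
_FILE_NAME = re.compile(r"^(?P<city>[A-Za-z]+)_ASE_(?P<ase_id>[A-Za-z0-9]+)\.csv$")


def parse_file_name(path: str) -> Tuple[str, str]:
    """Return (city, sensor-system ID) from a data-file path."""
    match = _FILE_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return match.group("city"), match.group("ase_id")


def to_value(field: Optional[str], as_text: bool = False) -> Value:
    """Empty field -> None (missing); otherwise float when possible, else the raw string."""
    if field is None or field == "":
        return None
    if as_text:
        return field
    try:
        return float(field)
    except ValueError:
        return field  # e.g. dates, LocationID or flag labels


def read_rows(stream: TextIO, text_columns: Iterable[str] = ()) -> Iterator[Dict[str, Value]]:
    """Yield one dict per minute row; csv.DictReader also copes with quoted fields."""
    keep_text = set(text_columns)
    for row in csv.DictReader(stream):
        yield {key: to_value(val, key in keep_text) for key, val in row.items()}


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    sample = "date,NO2_B43F_P1,bin_1,LocationID\n2020-04-02 00:01:00,,0,ANT_REF_R801\n"
    for record in read_rows(io.StringIO(sample), text_columns=["LocationID"]):
        print(record)
```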

Metadata.
Five metadata files are provided to describe:
• the sensors used in the AirSensEUR sensor systems (metadata_sensors.csv);
• the brand names of the reference analysers used at all sampling sites (metadata_sites.csv);
• additional information on the sampling sites, e.g. location description, positioning of the sensor systems and pictures of the deployment (metadata_sampling_site.pdf);
• the sampling dates of the feasibility study, pilot studies, first and second co-location and deployment for all AirSensEUR sensor systems (metadata_dates.csv);
• the particle diameters associated with each bin (metadata_Fidas_um.csv).

Ref.CO_ppm: reference CO measurement data; the reference analyser and unit are given in Table 2 (decimal number).

Ref.CO2: reference CO2 measurement data; the reference analyser and unit are given in Table 2 (decimal number).

Ref.NO: reference NO measurement data; the reference analyser and unit are given in Table 2 (decimal number).

Ref.NO2: reference NO2 measurement data; the reference analyser and unit are given in Table 2 (decimal number).

Ref.O3: reference O3 measurement data; the reference analyser and unit are given in Table 2 (decimal number).

Ref.PM10: reference PM10 measurement data; the reference analyser and unit are given in Table 2.

Table 7. Description of the air pollutant reference data present in all datasets of the AirSensEUR sensor systems, with their coordinates, temperature, relative humidity and atmospheric pressure at their location. Note 1: in Oslo, all reference PM10 and PM2.5 mass concentrations given by the FIDAS 200(S) and TEOM instruments were normalised using the slope and intercept of the regression line of daily TEOM and FIDAS data against daily data from low-volume samplers. Note 2: in Antwerp during the 2nd co-location, the reference temperature and relative humidity were given by the FIDAS 200 at site ANT_REF_R801, because the official meteorological station was no longer operational. During the 1st co-location, temperature and relative humidity from the meteorological station and the FIDAS 200 were checked and agreed well: R² = 0.99, slope = 1 and intercept = 0.3 for temperature, and R² = 0.94, slope = 0.95 and intercept = −1.2 for relative humidity.

The metadata in metadata_sensors.csv are described in Table 1, those in metadata_sites.csv in Table 9, and those in metadata_dates.csv in Table 10.
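Notes 1 and 2 of Table 7 both rest on an ordinary least-squares line between daily data. The Python sketch below computes the slope, intercept and R². It also shows one possible reading of the Oslo normalisation. If daily instrument data y are regressed on daily low-volume-sampler data x (y = slope·x + intercept), an instrument reading can be mapped onto the sampler scale by inverting the line. This inversion and the function names are our assumptions and do not describe the exact procedure used for the published values.

```python
from statistics import fmean
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept; returns (slope, intercept, R²)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)  # squared Pearson correlation for simple regression
    return slope, intercept, r2


def normalise(value: float, slope: float, intercept: float) -> float:
    """Map an instrument reading onto the reference (x) scale by inverting the line."""
    return (value - intercept) / slope


if __name__ == "__main__":
    # Hypothetical daily means: low-volume sampler (x) and instrument (y), µg/m³.
    lvs = [10.0, 20.0, 30.0, 40.0]
    teom = [2.0 * v + 1.0 for v in lvs]
    slope, intercept, r2 = ols(lvs, teom)
    print(slope, intercept, r2, normalise(41.0, slope, intercept))
```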

Technical Validation
Quality assurance/quality control (QA/QC) procedures. During deployment, the sensor systems were checked regularly, and in some cases sensors had to be replaced or cleaned. For the reference data, common QA/QC procedures were applied, consistent with the objectives of the European air quality directive (2008/50/EC) and conforming to internationally accepted standards (EN ISO/IEC 17025). This means that the reference monitors at the AQMSs are serviced and calibrated on a regular basis, and that measurement uncertainties meet the Data Quality Objectives of the European air quality directive (2008/50/EC) 10,16. In this paper, inconsistent sensor data were flagged. Data were flagged when certain threshold values were exceeded, indicating that the results are unreliable. In some cases, data were flagged manually based on knowledge from the field, without any threshold being exceeded. The principle of data flagging is described below.
Data collection and data flagging. Low-cost sensors may occasionally supply inconsistent data, e.g. before reaching equilibrium, when used outside their operating range of temperature or humidity, under other extreme conditions, or simply while being transported. A QA/QC procedure for the sensor data is therefore necessary. Below, a set of QA/QC and filtering steps is suggested. These steps were used to produce the quality flags in the datasets.
All dataset files contain columns giving sensor data quality flags. They indicate the results of the QA/QC procedures applied to the sensor data. The flags allow users to filter sensor data so that only robust data are used, or to compare the output of their own filtering procedures with the filtering provided with the data. The flag columns are named Sensor_Flag, where Sensor is one of CO_A4_P1, D300, NO_B4_P1, NO2_B43F_P1, OX_A431_P1, 5301CAT, 5301CST, 5325CAT, 5325CST, 5310CAT, 5310CST, OPCN3PM1, OPCN3PM25 or OPCN3PM10. The data flags can contain the following labels:
• empty label: valid raw data after all QA/QC procedures have been applied;
• "W": data flagged during warm-up of the AirSensEUR sensor system after a cold start, any reboot of the sensor system or a restart of AirSensEUR data acquisition. Warm-up time is needed for the sensors to reach their full response capacity. Table 11 gives the suggested warm-up time for all sensors in the row "Warming";
• "T.min" or "T.max" and/or "Rh.min" or "Rh.max": data outside the temperature and/or relative humidity limits. These four thresholds were determined empirically, either from experience or from laboratory experiments 4,17,18. Extreme temperature and humidity may affect sensor performance, resulting in inaccurate, noisy and/or questionable data. The upper and lower bounds of temperature and relative humidity were set to filter out sensor data, because sensors may behave incorrectly outside these bounds; for example, the OPC-N3 overestimates PM mass concentration at high relative humidity. The suggested limits of acceptability for temperature (T.min and T.max) and relative humidity (rh.min and rh.max) are given in Table 11;
• "Low_values" or "High_values": data lower than the minimum acceptable values (Min_values in Table 11) or higher than the maximum acceptable values (Max_values in Table 11). Both limits reflect the range of the AirSensEUR data acquisition, the operational range of the sensors or physically impossible air pollution levels;
• "OutliersMin" or "OutliersMax": data flagged by the outlier filtering procedure. Occasional outliers in sensor data may occur for several reasons. Outliers at all x_i in the dataset were detected using a Hampel filter based on the Mean Absolute Deviation MAD_i. MAD_i was computed using Eq. 1 over a rolling time window centred on x_i, which includes all x_j values within the time Window (see Window in Table 11). Subsequently, MAD_i was expanded with the Threshold factor (see Threshold in Table 11) […] point of outlier detection using MAD is to choose the time window such that spikes in the data can be recognised as either real or outliers. The time window was set to 3 h (181 data points);
• "Inv": sensor data flagged as invalid. A few periods were flagged manually as invalid because they corresponded to relocation of the sensor systems, unknown sampling locations, maintenance or calibration of the reference analysers, or malfunctions of sensors, e.g. insects in the OPC, ageing of chemical sensors or general sensor failure.
Table 11 (fragment): filtering parameters per sensor; Window = 181 min for all sensors, with sensor-specific Threshold values.
… (Table 9) where the pilot studies took place in the three cities for the ten AirSensEUR sensor systems involved in the feasibility study in Ispra.
"Inv" is sometimes added to the flag even though the sensor data are correct. This is done during maintenance or calibration of the reference analysers, when sensor and reference data should not be compared.
When sensor data fail two or more of the criteria listed above, the Sensor_Flag field contains the concatenated flag labels, separated by commas and enclosed in quotes.
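A small helper can split a Sensor_Flag field into its labels and decide whether a value should be kept. In the Python sketch below, the label set is taken from the list above. The function names and the ignore parameter are illustrative assumptions. The ignore parameter lets a user keep, for example, OPC-N3 data flagged only with "Rh.max" for Köhler fitting.

```python
from typing import AbstractSet, FrozenSet, Optional

# Labels listed in the text; an empty field means the value passed all checks.
KNOWN_LABELS = frozenset({
    "W", "T.min", "T.max", "Rh.min", "Rh.max",
    "Low_values", "High_values", "OutliersMin", "OutliersMax", "Inv",
})


def parse_flag(field: Optional[str]) -> FrozenSet[str]:
    """Split a Sensor_Flag field into its labels (empty set = valid data)."""
    if not field:
        return frozenset()
    # A csv reader normally removes the enclosing quotes; strip any that remain.
    labels = frozenset(
        part.strip().strip('"') for part in field.split(",") if part.strip().strip('"')
    )
    unknown = labels - KNOWN_LABELS
    if unknown:
        raise ValueError(f"unknown flag label(s): {sorted(unknown)}")
    return labels


def is_valid(field: Optional[str], ignore: AbstractSet[str] = frozenset()) -> bool:
    """True when no label remains after removing the labels the user chooses to ignore."""
    return not (parse_flag(field) - ignore)


if __name__ == "__main__":
    print(is_valid(""))                                    # True
    print(is_valid('"T.max,Rh.max"'))                      # False
    print(is_valid("Rh.max", ignore=frozenset({"Rh.max"})))  # True
```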
For the OPC-N3 sensors, rh.max was initially set to 70%, as suggested by the manufacturer. However, based on testing, it was later set to 100% to keep all data in case they are later used for calibration with Köhler models 18-20. The Köhler model requires data at relative humidity above 70% to achieve the best possible fit. Several tests were performed to determine rh.max for multilinear and Köhler fitting. The best predictions were obtained with an rh.max of 70% for multilinear fitting and 100% for Köhler fitting.
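The threshold-based labels (T.min, T.max, Rh.min, Rh.max, Low_values, High_values) can be reproduced along the lines of the Python sketch below. The example also shows how the OPC-N3 choice of rh.max (70% versus 100%) changes the outcome. The limit values are placeholders, not those of Table 11. Which temperature and humidity readings feed the check (e.g. SHT31TE/SHT31HE or SHT31TI/SHT31HI) is an assumption, and range_flags is a hypothetical name.

```python
from typing import List, Mapping, Optional


def range_flags(value: Optional[float], temp: Optional[float], rh: Optional[float],
                limits: Mapping[str, float]) -> str:
    """Return the comma-joined threshold labels for one sensor reading ("" if none)."""
    labels: List[str] = []
    if temp is not None:
        if temp < limits["T.min"]:
            labels.append("T.min")
        elif temp > limits["T.max"]:
            labels.append("T.max")
    if rh is not None:
        if rh < limits["rh.min"]:
            labels.append("Rh.min")
        elif rh > limits["rh.max"]:
            labels.append("Rh.max")
    if value is not None:
        if value < limits["Min_values"]:
            labels.append("Low_values")
        elif value > limits["Max_values"]:
            labels.append("High_values")
    return ",".join(labels)


if __name__ == "__main__":
    # Placeholder limits for an OPC-N3 PM channel (not the Table 11 values).
    opc = {"T.min": -20.0, "T.max": 40.0, "rh.min": 0.0, "rh.max": 70.0,
           "Min_values": 0.0, "Max_values": 1000.0}
    print(range_flags(12.0, 20.0, 85.0, opc))                        # prints: Rh.max
    print(range_flags(12.0, 20.0, 85.0, {**opc, "rh.max": 100.0}))   # prints an empty line
```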
All values of the filtering parameters are given in Table 11. They are mainly derived from experience (i.e. Warming, Window and Threshold). Note that the values of T.min, T.max, rh.min and rh.max do not discard many outliers. However, these parameters could be set to more stringent values, for example to filter out high relative humidity for PM sensors or high temperatures that affect NO_B4 sensors. The Min_values and Max_values for the CO_A4, NO_B4, NO2_B43F and OX_A431 sensors are constrained by the electronics of the AirSensEUR data acquisition and should not be changed. Conversely, the Min_values and Max_values for the D300, PMS5003, OPC-N3, BMP280, SHT31HE, SHT31HI, SHT31TE and SHT31TI sensors are based on expected readings and can be fine-tuned to discard outliers. Finally, most values in Table 11 are not absolute rules, and data users can experiment with new values to improve the filtering procedure.
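The outlier step can be sketched in Python as a centred rolling Hampel filter, using the Window (181 points) and a per-sensor Threshold from Table 11. Because Eq. 1 is not reproduced here, the sketch uses the textbook Hampel formulation, based on the median and the median absolute deviation. By default it does not apply the 1.4826 consistency factor (the scale parameter). Both choices are assumptions and may differ from the procedure used to produce the published flags, and the default threshold of 20 is only a placeholder. Values above the window median plus Threshold × MAD are labelled "OutliersMax", and values below the median minus Threshold × MAD are labelled "OutliersMin".

```python
from statistics import median
from typing import List, Optional, Sequence


def hampel_flags(values: Sequence[Optional[float]], window: int = 181,
                 threshold: float = 20.0, scale: float = 1.0) -> List[str]:
    """Label each point "OutliersMax", "OutliersMin" or "" with a centred Hampel filter.

    window    -- odd number of samples in the centred window (181 = 3 h of minutes)
    threshold -- multiplier of the MAD (placeholder default; sensor-specific in practice)
    scale     -- optional MAD scale factor (1.4826 makes MAD consistent with a
                 normal standard deviation); 1.0 leaves the MAD unscaled
    """
    half = window // 2
    flags: List[str] = []
    for i, x in enumerate(values):
        if x is None:  # missing minute: nothing to test
            flags.append("")
            continue
        # Present values inside the centred window, truncated at the series ends.
        neighbours = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        med = median(neighbours)
        mad = scale * median(abs(v - med) for v in neighbours)
        if x > med + threshold * mad:
            flags.append("OutliersMax")
        elif x < med - threshold * mad:
            flags.append("OutliersMin")
        else:
            flags.append("")
    return flags
```

This direct loop costs O(n × window) and favours readability. For a full year of minute data, a vectorised rolling median (for example pandas Series.rolling(window, center=True).median()) would be the practical choice.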

Usage Notes
For users who would like to compare minute-level sensor data with minute-level reference data, a lag between the sensor and reference time series can be an obstacle. AirSensEUR time series refer to Coordinated Universal Time (UTC) obtained from GPS, GPRS or WiFi. This does not rule out differences in response time between low-cost sensors (LCS) and reference analysers, or other errors in the reported times of the data series. Before any data treatment, it is therefore strongly suggested to apply a lag correction to the sensor and reference data series being studied. The lag between two data series can be estimated from the output of the cross-correlation function (CCF) 21. The maximum of the CCF can be found using the Find_Max_CCF function in the "151016 Sensor_toolBox.R" file (https://github.com/ec-jrc/airsenseur-calibration).
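For readers not working in R, the idea behind Find_Max_CCF can be illustrated in Python. The sketch computes the Pearson correlation between the reference series and the shifted sensor series for a range of lags, and keeps the lag with the highest correlation. It is a standalone sketch, not a port of the R function. The names best_lag and max_lag are our choices, as is the sign convention: a positive lag means the sensor trails the reference. Missing minutes (None) are skipped pairwise.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple

Series = Sequence[Optional[float]]


def pearson(a: Series, b: Series) -> Optional[float]:
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def best_lag(reference: Series, sensor: Series, max_lag: int = 30) -> Tuple[int, float]:
    """Lag (in samples) that maximises the correlation, and that correlation.

    A positive lag means the sensor series trails the reference series.
    """
    best_k, best_r = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:   # pair reference[t] with sensor[t + k]
            r = pearson(reference[: len(reference) - k], sensor[k:])
        else:        # pair reference[t - k] with sensor[t]
            r = pearson(reference[-k:], sensor[: len(sensor) + k])
        if r is not None and r > best_r:
            best_k, best_r = k, r
    return best_k, best_r
```

Once the lag is known, the sensor series can be shifted by that many minutes before it is paired with the reference data.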